R for Demographic Data Analysis

R is a statistical programming language that can be used for everything from importing data and automating repetitive analysis workflows to creating publication-ready data visualizations, including maps. It can also be used to build interactive data visualizations and web apps.

This demo shows how to use R to pull data from the Census API, explore the data in table and map form, and export to a shapefile that can be used to create publication-ready maps in ArcGIS.

Loading required R packages

R comes preloaded with many useful functions for statistical analysis, but much more functionality is available through extensions called packages that are developed by other R users.

You only need to install each package once on your computer, but whenever you are writing code in R you will have to load the relevant packages with the library() function.

library(tidycensus)
library(tigris)
library(tidyverse)
library(sf)
library(mapview)
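If one of these library() calls fails with a "there is no package called ..." error, that package has not been installed yet. The sketch below is an addition to the original demo: it uses base R's requireNamespace() to check which of the five packages are missing and prints the one-time install.packages() command to run. The names required_pkgs, is_installed, and missing_pkgs are illustrative.

```r
# Packages used in this demo
required_pkgs <- c("tidycensus", "tigris", "tidyverse", "sf", "mapview")

# requireNamespace() returns TRUE when a package is installed, without attaching it
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  # Print the one-time install command instead of installing automatically
  message(
    "Run once: install.packages(c(",
    paste0('"', missing_pkgs, '"', collapse = ", "),
    "))"
  )
} else {
  message("All required packages are already installed.")
}
```

Printing the command rather than running it keeps the check free of side effects; you can paste the printed line into the console when you are ready to install.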

Each of these packages provides us with various helpful functions for geospatial data analysis:

  • tidycensus contains functions for downloading datasets directly from the Census Bureau API, including associated geometries.

  • tigris can be used to download census geometries without any attribute data (the boundary of Austin, for example).

  • tidyverse is actually a collection of several packages such as tidyr, dplyr, and ggplot2 that can be used for transforming, analyzing, and visualizing data.

  • sf is a package that allows essential GIS workflows to be executed in R.

  • mapview provides functions for generating interactive maps. Other packages exist for creating detailed maps, but this one is focused on creating quick and simple maps for data exploration.

Importing data

Census data with tidycensus

Before we import census data, it’s often helpful to define what data we need and save those definitions as variables in R that can be referenced later in our code. Below we create two character vectors using c("item1", "item2", ...).

First we define which demographic variables we want to download as a named vector called race_vars, where each name is a readable label and each value is a Census variable code. We also define a vector of all the counties in the Austin Metropolitan Statistical Area.

race_vars <- c(
  "% Hispanic or Latino" = "DP05_0073P",
  "% White" = "DP05_0079P",
  "% Black" = "DP05_0080P",
  "% AIAN" = "DP05_0081P",
  "% Asian" = "DP05_0082P",
  "% Pacific Islander" = "DP05_0083P",
  "% Other Race" = "DP05_0084P",
  "% Two or More Races" = "DP05_0085P"
)
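race_vars is a named character vector, so it carries both a readable label and a variable code for each item. The short sketch below is an illustration added here; it uses a two-item stand-in called race_vars_demo (a hypothetical name) so that it runs on its own.

```r
# Two-item stand-in for race_vars: names are labels, values are variable codes
race_vars_demo <- c(
  "% White" = "DP05_0079P",
  "% Asian" = "DP05_0082P"
)

names(race_vars_demo)          # the readable labels
unname(race_vars_demo)         # the Census variable codes
race_vars_demo[["% Asian"]]    # look up one code by its label
```

When a named vector like this is passed to get_acs(), tidycensus uses the names in place of the raw codes in its output, which is why the columns later in this demo carry names such as % WhiteE and % WhiteM.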

austin_msa_counties <- c("Travis", "Hays", "Williamson", "Caldwell", "Bastrop")

Finding Variable Codes

Finding the right variable codes can be tricky, but the load_variables() function makes them easy to look up. Provide the year and the census dataset you’re interested in to generate a searchable table. In this case, I used view(load_variables(2022, "acs5/profile")).
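The variable table can also be searched in code instead of in the viewer. The sketch below is a hedged illustration: find_vars() is a hypothetical helper, not part of tidycensus, and toy_vars is a small stand-in table that reuses three codes from race_vars with simplified labels (the real labels returned by load_variables() are longer).

```r
# Stand-in for the table returned by load_variables(); labels are simplified
toy_vars <- data.frame(
  name  = c("DP05_0073P", "DP05_0079P", "DP05_0082P"),
  label = c("% Hispanic or Latino", "% White", "% Asian")
)

# Hypothetical helper: keep rows whose label matches a pattern, ignoring case
find_vars <- function(vars, pattern) {
  vars[grepl(pattern, vars$label, ignore.case = TRUE), , drop = FALSE]
}

hits <- find_vars(toy_vars, "hispanic")
hits$name
```

With the packages above loaded, the same helper could be pointed at the real table, for example find_vars(load_variables(2022, "acs5/profile"), "hispanic"), since load_variables() returns a table with name, label, and concept columns.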

Now it’s time to use tidycensus to actually import our data. The most commonly used functions are get_acs() and get_decennial(), which take largely the same set of arguments.

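  • geography can be set to any of the census geographies, such as "state", "county", "tract", "block group", or "place". Blocks are available only from get_decennial(), since ACS data are not published below the block group level.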

  • variables can be set to a single variable code or to a vector of multiple codes like the one we created above. Alternatively, if you want all the variables in a table, you can replace variables with table and the appropriate table code.

  • output can be set to either wide or tidy. For multiple variables, wide will often make the most sense. If only pulling data for one variable (median income, for example), tidy is usually the best option. Experiment with both settings to see which works best for your needs.
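To make the difference between the two output settings concrete, the sketch below builds a toy table in the tidy layout (one row per tract and variable, with estimate and moe columns, as tidycensus returns it) and reshapes it to a wide layout with tidyr::pivot_wider(). This is an illustration only: the tract IDs and variable names are placeholders, and tidycensus itself names wide columns with E and M suffixes rather than the estimate_ and moe_ prefixes produced here.

```r
library(tidyr)

# Toy values in the "tidy" layout: one row per tract and variable
tidy_demo <- data.frame(
  GEOID    = c("tract_A", "tract_A", "tract_B", "tract_B"),
  variable = c("pct_asian", "pct_white", "pct_asian", "pct_white"),
  estimate = c(8.8, 51.8, 6.5, 56.6),
  moe      = c(7.1, 11.4, 5.1, 9.0)
)

# Reshape to the "wide" layout: one row per tract, one column per variable/value pair
wide_demo <- pivot_wider(
  tidy_demo,
  names_from  = variable,
  values_from = c(estimate, moe)
)

dim(tidy_demo)  # rows and columns in the tidy layout
dim(wide_demo)  # rows and columns in the wide layout
```

The wide layout has one row per tract, which is usually what a map layer needs; the tidy layout is often more convenient for grouped summaries and plotting.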

Understanding R packages and functions

To see the full list of arguments for any function, as well as package documentation, type a ? followed by the package/function name into the R console. Example: ?get_acs

austin_race <- get_acs(
  geography = "tract",
  variables = race_vars,
  year = 2022,
  output = "wide",
  state = "TX",
  county = austin_msa_counties,
  geometry = TRUE,
  cb = FALSE,
  survey = "acs5"
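)

Warning

Make sure to set cb = FALSE to get the same polygons as when downloading from the Census website. Without it, tidycensus by default returns the generalized cartographic boundary files rather than the more detailed TIGER/Line shapefiles.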

Import geographies with tigris

If you don’t need any tabular data and just want geographic layers from the census, you can use the tigris package to download any of the TIGER/Line shapefiles. Below, for example, we download the boundary of the City of Austin.

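# Download all Census places in Texas, then keep those whose name contains "Austin".
# For an exact match, filter(NAME == "Austin") could be used instead.
austin_boundary <- places(state = "TX") %>%
  filter(str_detect(NAME, "Austin"))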

Exploring the data

Once you import the data with tidycensus, there are several simple functions you can use to explore the object we saved as austin_race.

Use the glimpse() function to see a summary of the dataset including number of rows, column names/types, and the first few values in each column.

glimpse(austin_race)
Rows: 503
Columns: 19
$ GEOID                   <chr> "48453002447", "48453040700", "48453041900", "…
$ NAME                    <chr> "Census Tract 24.47; Travis County; Texas", "C…
$ `% Hispanic or LatinoE` <dbl> 71.8, 80.9, 34.1, 24.4, 22.1, 40.5, 39.2, 10.0…
$ `% Hispanic or LatinoM` <dbl> 11.8, 6.9, 11.3, 7.5, 7.7, 14.5, 11.7, 2.2, 10…
$ `% WhiteE`              <dbl> 20.4, 11.9, 51.8, 56.6, 58.7, 27.1, 29.0, 76.8…
$ `% WhiteM`              <dbl> 9.1, 5.0, 11.4, 9.0, 9.1, 25.8, 13.8, 9.2, 9.4…
$ `% BlackE`              <dbl> 4.1, 4.8, 2.1, 8.7, 0.8, 19.6, 9.9, 5.4, 28.2,…
$ `% BlackM`              <dbl> 5.7, 4.8, 2.0, 6.3, 0.9, 21.2, 10.0, 7.1, 12.3…
$ `% AIANE`               <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0…
$ `% AIANM`               <dbl> 1.5, 0.7, 1.5, 3.1, 0.7, 0.9, 1.9, 1.2, 1.6, 0…
$ `% AsianE`              <dbl> 0.1, 1.7, 8.8, 6.5, 17.0, 2.4, 21.9, 7.0, 0.3,…
$ `% AsianM`              <dbl> 0.3, 1.8, 7.1, 5.1, 7.0, 2.1, 13.3, 4.0, 0.5, …
$ `% Pacific IslanderE`   <dbl> 0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0…
$ `% Pacific IslanderM`   <dbl> 1.5, 0.7, 0.5, 3.1, 0.7, 0.9, 1.9, 1.2, 1.6, 1…
$ `% Other RaceE`         <dbl> 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0…
$ `% Other RaceM`         <dbl> 1.5, 0.7, 1.5, 0.7, 0.7, 0.7, 1.9, 1.2, 1.6, 1…
$ `% Two or More RacesE`  <dbl> 3.6, 0.6, 2.9, 3.5, 1.4, 10.0, 0.0, 0.9, 0.8, …
$ `% Two or More RacesM`  <dbl> 4.1, 0.8, 2.5, 2.5, 1.1, 12.7, 1.9, 1.0, 1.2, …
$ geometry                <POLYGON [°]> POLYGON ((-97.74044 30.2068..., POLYGO…

Using the view() function will open the data table in a new tab in RStudio, with standard table functions like sorting and filtering.

view(austin_race)
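The same sorting and filtering can be done in code with dplyr, which is handy when you want the result to be reproducible. The sketch below is a hedged illustration: race_demo is a stand-in built from the first three tracts shown in the glimpse() output above, and the 50 percent cutoff is an arbitrary choice for the example.

```r
library(dplyr)

# Stand-in for austin_race: first three tracts from the glimpse() output
race_demo <- tibble(
  GEOID = c("48453002447", "48453040700", "48453041900"),
  `% Hispanic or LatinoE` = c(71.8, 80.9, 34.1),
  `% Hispanic or LatinoM` = c(11.8, 6.9, 11.3)
)

# Keep tracts above an (arbitrary) 50 percent cutoff, highest estimate first
majority_hispanic <- race_demo %>%
  filter(`% Hispanic or LatinoE` > 50) %>%
  arrange(desc(`% Hispanic or LatinoE`))

majority_hispanic
```

Column names that contain spaces or symbols need backticks in code. On the real austin_race object, sf::st_drop_geometry() can be applied first if you want a plain table without the geometry column.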

Finally, the mapview() function can be used to easily generate a simple interactive map of your data. Use zcol to define which column you want to visualize.

mapview(austin_race, zcol = "% Hispanic or LatinoE")

Exporting shapefiles for ArcGIS

Although R can be used to create interactive web maps, we will often want to use R only for importing and cleaning the data, and then use an enterprise tool like ArcGIS Pro to create the final layers to publish.

The sf package contains the function st_write(), which can export our data into a wide variety of file formats. Here we export as a shapefile that can be easily opened in ArcGIS.
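As a self-contained illustration of the export pattern, the sketch below writes a toy sf object to a shapefile in a temporary folder and reads it back. Several things here are assumptions made for the example rather than part of the original workflow: the point geometries and the NAD83 coordinate system (EPSG:4269) are placeholders for the real tract polygons, and the short column name hisp_pctE is a hypothetical choice. Renaming matters because shapefile field names are limited to 10 characters, so long names such as % Hispanic or LatinoE would otherwise be abbreviated on export.

```r
library(sf)

# Toy attribute table with one of the long column names used in this demo
attrs <- data.frame(
  GEOID = c("48453002447", "48453040700"),
  `% Hispanic or LatinoE` = c(71.8, 80.9),
  check.names = FALSE
)

# Shorten the name to fit the 10-character shapefile field limit
names(attrs)[names(attrs) == "% Hispanic or LatinoE"] <- "hisp_pctE"

# Placeholder point geometries (assumed NAD83, EPSG:4269) instead of tract polygons
toy_sf <- st_sf(
  attrs,
  geometry = st_sfc(
    st_point(c(-97.74, 30.21)),
    st_point(c(-97.70, 30.25)),
    crs = 4269
  )
)

# Write to a fresh temporary folder, then read the file back to check it
out_dir <- tempfile("shp_demo_")
dir.create(out_dir)
out_path <- file.path(out_dir, "austin_race_demo.shp")
st_write(toy_sf, out_path, quiet = TRUE)

toy_back <- st_read(out_path, quiet = TRUE)
names(toy_back)
```

For the real data, the same st_write() call would take austin_race (after renaming its columns) and a path in your project folder instead of a temporary one.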